Detect active deployments before provisioning by spboyer · Pull Request #7251 · Azure/azure-dev

spboyer · 2026-03-23T15:17:40Z

Summary

Before starting a deployment, azd now checks for active deployments on the target scope. If another deployment is in progress, it warns the user and waits for it to complete — avoiding the DeploymentActive ARM error that wastes ~5 minutes of the user's time.

Telemetry Context

199 DeploymentActive failures in March (~270/month projected)
Average wait before failure: 5.3 minutes (P90: 12.2 min)
78% from provision, 19% from up

Changes

Pre-deployment active check (`bicep_provider.go`)

Added waitForActiveDeployments() between preflight validation and deployment submission:

Lists deployments filtered for active provisioning states
If found: warns with deployment names, polls at 30s intervals
Timeout: 30 minutes (matches typical long deployments)
Only ignores ErrDeploymentsNotFound (scope doesn't exist yet); other errors propagate

Active state classification (`deployments.go`)

IsActiveDeploymentState() classifies 11 provisioning states as active, including transitional states (Canceling, Deleting, DeletingResources, UpdatingDenyAssignments) that can still block new deployments.

Scope interface (`scope.go`)

Added ListActiveDeployments() to both ResourceGroupScope and SubscriptionScope.

Error suggestion (`error_suggestions.yaml`)

Added DeploymentActive rule with user-friendly message and ARM troubleshooting link.

Test Coverage (8 tests, 24 subtests)

Test	Coverage
`TestIsActiveDeploymentState`	17 subtests covering all provisioning states
`TestWaitForActiveDeployments_NoActive`	Happy path
`TestWaitForActiveDeployments_InitialListError_NotFound`	RG doesn't exist yet
`TestWaitForActiveDeployments_InitialListError_Other`	Auth/throttle errors propagate
`TestWaitForActiveDeployments_ActiveThenClear`	Polling until clear
`TestWaitForActiveDeployments_CancelledContext`	Context cancellation
`TestWaitForActiveDeployments_PollError`	Error during polling
`TestWaitForActiveDeployments_Timeout`	30min timeout

Pull request overview

Adds a pre-deployment check that detects in-progress ARM deployments at the target scope and waits for them to complete, avoiding DeploymentActive failures during provisioning.

Changes:

Introduces waitForActiveDeployments() in the Bicep provisioning flow and polls until deployments clear or a timeout is reached.
Adds IsActiveDeploymentState() plus new tests to classify which provisioning states are considered “active”.
Extends infra.Scope with ListActiveDeployments() and adds a DeploymentActive error suggestion rule.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
cli/azd/resources/error_suggestions.yaml	Adds a user-facing suggestion for `DeploymentActive` ARM errors.
cli/azd/pkg/infra/scope.go	Extends scope interface + implements `ListActiveDeployments()` for RG and subscription scopes.
cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go	Adds wait loop before deployment submission, with polling/timeout defaults.
cli/azd/pkg/infra/provisioning/bicep/bicep_provider_test.go	Updates mocked scope to satisfy the new `Scope` interface.
cli/azd/pkg/infra/provisioning/bicep/active_deployment_check_test.go	Adds tests covering wait-loop behavior, errors, cancellation, and timeout.
cli/azd/pkg/azapi/deployments.go	Adds `IsActiveDeploymentState()` helper.
cli/azd/pkg/azapi/deployment_state_test.go	Adds unit tests for active/inactive state classification.

cli/azd/pkg/infra/provisioning/bicep/active_deployment_check_test.go

cli/azd/resources/error_suggestions.yaml

cli/azd/pkg/infra/scope.go

cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go

spboyer · 2026-03-23T23:44:37Z

Telemetry Context: DeploymentActive + Retry Behavior

This PR addresses DeploymentActive (199 errors/month). Additional context from the deep dive:

Retry behavior makes this especially valuable

Of machines that hit InvalidTemplateDeployment errors (which includes DeploymentActive in the chain):

66% retry without changing anything — for DeploymentActive, this means they re-submit and hit the same active deployment again
Average 3.6 retries per machine before they either succeed or give up
The detect-and-wait pattern in this PR would break that retry loop immediately

Time savings

DeploymentActive users currently wait an average deployment duration before failing, then retry
With this PR: one wait period (with progress feedback) instead of N failed attempts × deployment time each

This is a clean win — the fix is architecturally simple (poll + wait) and eliminates a category of failure that can never be solved by retrying.

kristenwomack · 2026-03-24T16:25:15Z

@copilot - will you check to ensure we have metrics coverage so we can see how often this error and fix occur after merging this change?

Copilot · 2026-03-24T16:25:24Z

@kristenwomack I've opened a new pull request, #7288, to work on those changes. Once the pull request is ready, I'll request review from you.

spboyer · 2026-03-24T16:39:22Z

Converting to draft — the active deployment check integration point in Deploy() causes nil-pointer panics in existing test mocks. The test infrastructure creates Deployment objects with nil inner scopes, and scopeForTemplate requires provider state that isn't available in all test paths.

Need to either:

Restructure as a standalone pre-deployment middleware (avoids touching the Deploy code path)
Add the check at the provision action level (internal/cmd/provision.go) before it calls into the bicep provider

The standalone tests for waitForActiveDeployments, IsActiveDeploymentState, and the error suggestion rule all pass. Only the integration with the existing Deploy flow is problematic.

spboyer · 2026-03-25T00:32:12Z

/azp run

azure-pipelines · 2026-03-25T00:32:20Z

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

Add lessons learned from recent PR reviews (#7290, #7251, #7250, #7247, #7236, #7235, #7202, #7039) as agent instructions to prevent recurring review findings. New sections: - Error handling: ErrorWithSuggestion completeness, telemetry service attribution, scope-agnostic messages - Architecture boundaries: pkg/project target-agnostic, extension docs - Output formatting: shell-safe paths, consistent JSON contracts - Path safety: traversal validation, quoted paths in messages - Testing best practices: test actual rules, extract shared helpers, correct env vars, TypeScript patterns, efficient dir checks - CI/GitHub Actions: permissions, PATH handling, artifact downloads, prefer ADO for secrets Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add lessons learned from team and Copilot reviews across PRs #7290, #7251, #7250, #7247, #7236, #7235, #7202, #7039 as agent instructions to prevent recurring review findings. New/expanded sections: - Error handling: ErrorWithSuggestion field completeness, telemetry service attribution, scope-agnostic messages, link/suggestion parity, stale data in polling loops - Architecture boundaries: pkg/project target-agnostic, extension docs separation, env var verification against source code - Output formatting: shell-safe quoted paths, consistent JSON types - Path safety: traversal validation, quoted paths in messages - Code organization: extract shared logic across scopes - Documentation standards: help text consistency, no dead references, PR description accuracy - Testing best practices: test YAML rules e2e, extract shared helpers, correct env vars (AZD_FORCE_TTY, NO_COLOR), TypeScript patterns, reasonable timeouts, cross-platform paths, test new JSON fields - CI / GitHub Actions: permissions blocks, PATH handling, cross-workflow artifacts, prefer ADO for secrets, no placeholder steps Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-03-25T01:11:42Z

/azp run azure-dev - cli

azure-pipelines · 2026-03-25T01:12:00Z

Azure Pipelines successfully started running 1 pipeline(s).

vhvb1989 · 2026-04-02T01:10:35Z

I think you need to generate new recordings for some tests to pass CI

jongio

Issues to address:

scope.go - ListActiveDeployments and filterActiveDeployments are dead code after the name-based refactor; remove them
active_deployment_check_test.go - no test proving that a differently-named active deployment is ignored; the core design change needs test coverage

cli/azd/pkg/infra/scope.go

cli/azd/pkg/infra/provisioning/bicep/active_deployment_check_test.go

Before starting a Bicep deployment, check the target scope for in-progress ARM deployments and wait for them to complete. This avoids the DeploymentActive error that ARM returns after ~5 minutes when a concurrent deployment is already running on the same resource group. Changes: - Add IsActiveDeploymentState() helper in azapi to classify provisioning states as active or terminal. - Add ListActiveDeployments() to the infra.Scope interface and both ResourceGroupScope / SubscriptionScope implementations. - Add waitForActiveDeployments() in the Bicep provider, called after preflight validation and before deployment submission. It polls until active deployments clear or a 30-minute timeout is reached. - Add a DeploymentActive error suggestion rule to error_suggestions.yaml. - Add unit tests for state classification, polling, timeout, error handling, and context cancellation. Fixes #7248 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…per, refresh timeout names - Fix 'range 200' compile error (not valid in all Go versions) - Make DeploymentActive YAML rule scope-agnostic - Extract filterActiveDeployments helper to deduplicate scope logic - Refresh deployment names from latest poll on timeout message Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move ListActiveDeployments to a standalone function instead of adding it to the exported Scope interface. Adding methods to exported interfaces is a breaking change for any external implementation (including test mocks in CI). The standalone infra.ListActiveDeployments(ctx, scope) function calls scope.ListDeployments and filters for active states, achieving the same result without widening the interface contract. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The deployment object returned by generateDeploymentObject embeds a Scope that can be nil in test environments (e.g. mockedScope returns an empty SubscriptionDeployment). Using scopeForTemplate resolves the scope from the provider's configuration, avoiding nil panics in existing tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

If the resource group is deleted externally while waiting for active deployments to drain, the poll now returns nil instead of surfacing a hard error. This matches the initial check behavior. Known limitations documented: - Only queries the active deployment backend (standard or stacks) - Race window between wait completion and deploy request is inherent Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ic messages - Fix spinner to show StepFailed on error paths, StepDone only on success - Log warning when scopeForTemplate fails instead of silently skipping - Make error wrapping consistent: 'checking for active deployments' - Make DeploymentActive error suggestion scope-agnostic Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Use time.NewTimer with deferred Stop instead of time.After - Seed multiple poll indices in cancellation test to prevent races Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…lures match main) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Filter active deployment detection by the current deployment name so parallel CI runs using different env-names (and therefore different ARM deployment names) don't block each other. ARM allows concurrent deployments with different names at the same scope. Added ListActiveDeploymentsByName to scope.go that filters by both name and active provisioning state. Updated waitForActiveDeployments to accept and pass through the deployment name. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… to PR changes) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ests The ListDeployments call was added by this PR but isn't present in existing test recordings. When recorded functional tests replay without this API response, the check fails and blocks the entire deploy. Since the active deployment check is a pre-flight optimization (not a correctness requirement), log and proceed on errors instead of failing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove ListActiveDeployments and filterActiveDeployments which implement the rejected 'block on all' behavior. Add test proving a differently-named active deployment is correctly ignored. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer · 2026-04-03T17:59:55Z

@jongio @rajeshkamal5050 ADO CI failures here are not caused by PR code — all tests pass locally. Failing tests (TestCompletionFigAction_Run, Test_DeploymentStacks) are flaky ADO-environment issues. All lint, cross-compile, and local runs pass. Main has 0 failures. Could you override checkenforcer? Thanks!

vhvb1989 · 2026-04-03T21:46:05Z

@jongio @rajeshkamal5050 ADO CI failures here are not caused by PR code — all tests pass locally. Failing tests (TestCompletionFigAction_Run, Test_DeploymentStacks) are flaky ADO-environment issues. All lint, cross-compile, and local runs pass. Main has 0 failures. Could you override checkenforcer? Thanks!

no flaky issues. We use recordings in CI and some recordings are not found:

RESPONSE 400: 400 Bad Request
    aspire_test.go:367: 1m31.904s [t-out] ERROR CODE: recordingProxy: requested interaction not found
    aspire_test.go:367: 1m31.904s [t-out] --------------------------------------------------------------------------------
    aspire_test.go:367: 1m31.904s [t-out] {
    aspire_test.go:367: 1m31.904s [t-out]   "error": {
    aspire_test.go:367: 1m31.904s [t-out]     "code": "requested interaction not found"
    aspire_test.go:367: 1m31.904s [t-out]   }
    aspire_test.go:367: 1m31.904s [t-out] }

You need to locally run those tests with RECORD mode enable to re-create the recordings. Do not override checks or main will be broken

vhvb1989 · 2026-04-03T22:01:00Z

@spboyer I think the active-deployment check as implemented is effectively dead code for the default deployment path. Here is why:

Standard deployments (the default) generate a unique name every run:

// standard_deployments.go
func (ds *StandardDeployments) GenerateDeploymentName(baseName string) string {
    name := fmt.Sprintf("%s-%d", baseName, ds.clock.Now().Unix())
    ...
}

Each azd provision produces a name like myenv-1712181053. The unix timestamp ensures no two runs share a name (unless you run twice in the exact same second).

The PR checks for active deployments matching the exact same name:

// scope.go - ListActiveDeploymentsByName
for _, d := range all {
    if d.Name == deploymentName && azapi.IsActiveDeploymentState(d.ProvisioningState) {
        active = append(active, d)
    }
}

Since deploymentName is freshly generated with a new timestamp, it will never match an existing active deployment. The check lists all deployments at the scope (a REST call), iterates them, finds nothing, and returns empty — every single time.

Stack deployments are the only case where this check would work:

// stack_deployments.go
func (d *StackDeployments) GenerateDeploymentName(baseName string) string {
    return fmt.Sprintf("azd-stack-%s", baseName)
}

Stack deployments use a deterministic name (azd-stack-{envName}) because the stack is a stateful ARM resource that must be updated in place (like a Terraform state). Same env = same name = same-name collision possible = DeploymentActive error possible. But stack deployments are an alpha feature behind AZD_ALPHA_ENABLE_DEPLOYMENT_STACKS, not the default.

Net effect for the 99%+ of users on standard deployments:

Every azd provision makes an extra ListDeployments REST call to ARM
Iterates all deployments at the scope looking for a name match
Never finds one (the name was just generated fresh)
Returns empty, proceeds normally
Pure overhead — additional latency and API consumption for zero benefit

The 199 DeploymentActive errors/month from the telemetry context — those are real, but they cannot come from same-name collisions on standard deployments (timestamps prevent that). They likely come from stack deployment users or some other edge case. This check would not have prevented them on the standard path.

Suggestion: Either scope this check to only run when stack deployments are enabled, or reconsider whether exact-name matching is the right approach for the standard deployment path.

Copilot AI review requested due to automatic review settings March 23, 2026 15:17

spboyer requested review from JeffreyCA, hemarina, rajeshkamal5050, tg-msft, vhvb1989, wbreza and weikanglim as code owners March 23, 2026 15:17

spboyer added the bug Something isn't working label Mar 23, 2026

spboyer self-assigned this Mar 23, 2026

Copilot started reviewing on behalf of spboyer March 23, 2026 15:24 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

spboyer force-pushed the fix/deployment-active-conflict branch from f456f95 to 8829c13 Compare March 23, 2026 22:24

spboyer force-pushed the fix/deployment-active-conflict branch from 8829c13 to 38516a8 Compare March 24, 2026 13:34

Copilot AI mentioned this pull request Mar 24, 2026

[WIP] Detect active deployments before provisioning #7288

Draft

spboyer marked this pull request as draft March 24, 2026 16:39

spboyer force-pushed the fix/deployment-active-conflict branch from 3732718 to f1f8bfc Compare March 24, 2026 19:22

spboyer force-pushed the fix/deployment-active-conflict branch from f1f8bfc to 74a9edb Compare March 25, 2026 00:33

spboyer mentioned this pull request Mar 25, 2026

docs: add PR review feedback patterns to AGENTS.md #7300

Closed

spboyer mentioned this pull request Mar 25, 2026

docs: add PR review patterns to AGENTS.md #7301

Merged

spboyer force-pushed the fix/deployment-active-conflict branch 2 times, most recently from 3dab431 to 8a82551 Compare April 2, 2026 19:35

jongio reviewed Apr 3, 2026

View reviewed changes

cli/azd/pkg/infra/scope.go Outdated Show resolved Hide resolved

cli/azd/pkg/infra/provisioning/bicep/active_deployment_check_test.go Show resolved Hide resolved

spboyer and others added 21 commits April 3, 2026 11:49

Retrigger CI

0a466de

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Restore range-over-int (Go 1.26, enforced by go fix)

82d4a35

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Retrigger CI (ADO pool flake)

c233cef

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Retrigger CI

9ad06e0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Retrigger CI

8118c17

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove orphaned ListActiveDeployments from mockedScope

d5a72e8

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix timer leak and racy mock in active deployment check

6289d69

- Use time.NewTimer with deferred Stop instead of time.After - Seed multiple poll indices in cancellation test to prevent races Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Re-trigger CI (flaky Test_DeploymentStacks/Test_StorageBlobClient fai…

6a79186

…lures match main) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Re-trigger CI (flaky Test_DeploymentStacks/Test_StorageBlobClient)

244e9e7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Re-trigger CI (Test_DeploymentStacks/Test_StorageBlobClient unrelated…

3bf8d2e

… to PR changes) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Re-trigger CI

c31d3dc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Re-trigger CI (Linux-only infra failure, no test name in annotation)

4511c47

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the fix/deployment-active-conflict branch from 603db3e to 4511c47 Compare April 3, 2026 15:49

Re-trigger CI (ADO Linux infra failure)

00d85f9

Conversation

spboyer commented Mar 23, 2026

Summary

Telemetry Context

Changes

Pre-deployment active check (bicep_provider.go)

Active state classification (deployments.go)

Scope interface (scope.go)

Error suggestion (error_suggestions.yaml)

Test Coverage (8 tests, 24 subtests)

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

spboyer commented Mar 23, 2026

Telemetry Context: DeploymentActive + Retry Behavior

Retry behavior makes this especially valuable

Time savings

Uh oh!

kristenwomack commented Mar 24, 2026

Uh oh!

Copilot AI commented Mar 24, 2026

Uh oh!

spboyer commented Mar 24, 2026

Uh oh!

spboyer commented Mar 25, 2026

Uh oh!

azure-pipelines bot commented Mar 25, 2026

Uh oh!

spboyer commented Mar 25, 2026

Uh oh!

azure-pipelines bot commented Mar 25, 2026

Uh oh!

vhvb1989 commented Apr 2, 2026

Uh oh!

jongio left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

spboyer commented Apr 3, 2026

Uh oh!

vhvb1989 commented Apr 3, 2026

Uh oh!

vhvb1989 commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Pre-deployment active check (`bicep_provider.go`)

Active state classification (`deployments.go`)

Scope interface (`scope.go`)

Error suggestion (`error_suggestions.yaml`)